refactor(shred-network): consolidate threads #980

dnut · 2025-10-02T01:58:04Z

The current structure of the shred network is copied directly from agave. Each pair of [square brackets] represents a thread. Agave uses different names, and it duplicates some of the components across multiple threads where we use just one, but otherwise the architecture is the same. I simply copied agave without thinking about how to improve the design.

[turbine socket listener]-->[packet tagger]--\
                                              |-->[shred receiver]-->[shred verifier]-->[shred processor]-->ledger
 [repair socket listener]-->[packet tagger]--/

This design is pointlessly complicated. It makes the code hard to understand and hard to debug. This is a sequence of steps that need to happen in order, and there's no good reason why they shouldn't happen in the same thread.

I've started to reduce the number of threads here by consolidating the logic from the shred receiver, verifier, and processor into a single thread, and by moving the packet tagging into the socket threads. Here's the new approach I have in the current PR:

[turbine socket listener]--\
                            |-->[shred receiver]-->ledger
 [repair socket listener]--/
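
To make the shape of the new design concrete, here is a minimal sketch in Python (sig itself is written in Zig, and none of these names come from the sig codebase). It assumes the packets arrive as plain byte strings, uses an emptiness check as a placeholder for real shred verification, and uses a list as a stand-in for the ledger. The two listener threads tag packets inline, and a single receiver thread runs every remaining step in order.

```python
import queue
import threading

STOP = object()  # sentinel: each listener sends one when its source is exhausted


def socket_listener(source, packets, out):
    """Stand-in for a socket thread: tags each packet with its source inline."""
    for payload in packets:
        out.put((source, payload))  # tagging happens here, no separate tagger thread
    out.put(STOP)


def shred_receiver(inbox, num_listeners, ledger):
    """Single thread that runs receive -> verify -> process -> insert in order."""
    remaining = num_listeners
    while remaining > 0:
        item = inbox.get()
        if item is STOP:
            remaining -= 1
            continue
        source, payload = item
        if not payload:  # placeholder for real shred verification
            continue
        ledger.append((source, payload))  # placeholder for the ledger write


def run(turbine_packets, repair_packets):
    """Hypothetical driver: two listener threads feeding one receiver thread."""
    inbox = queue.Queue()
    ledger = []
    threads = [
        threading.Thread(target=socket_listener, args=("turbine", turbine_packets, inbox)),
        threading.Thread(target=socket_listener, args=("repair", repair_packets, inbox)),
        threading.Thread(target=shred_receiver, args=(inbox, 2, ledger)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ledger
```

Calling `run([b"a", b""], [b"c"])` drops the empty packet and returns the other two tagged with their source. The only cross-thread handoff left is the queue between the socket threads and the receiver.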

I'd like to also flatten down the socket threads so that this entire thing becomes a single thread, but that's going to require a more complicated rework of our networking code.

I haven't done much benchmarking yet, but the performance appears to be unchanged by this refactor. Before and after this change, sig is able to process about 40 slots' worth of mainnet shreds per second on my computer running on my home network.

Future optimizations

If the number of shreds skyrockets in the future and we really need multiple threads to handle them, then it will require some more rework, but not a return to the old design. The old design was overly pipelined. If you need more parallelism, you really only need to define two single-threaded tasks:

1. receive the shreds from the network, verify them, and collect metadata about them
2. write the shreds into the ledger

You can parallelize task 1 in a thread pool if necessary. Currently we have task 1 split into six different threads that all do different things, and I don't see any reason to keep this up.
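
As a rough illustration of that two-task split, here is a sketch in Python rather than Zig. Everything in it is an assumption made for the example: the packet layout (a 4-byte checksum followed by a `slot:index:` header), the checksum standing in for signature verification, and the dict standing in for the ledger. Task 1 runs in a thread pool, and task 2 stays on a single thread so the ledger needs no locking.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def make_packet(slot, index, data):
    """Hypothetical wire format: 4-byte checksum, then b'slot:index:' and the data."""
    body = f"{slot}:{index}:".encode() + data
    return hashlib.sha256(body).digest()[:4] + body


def receive_and_verify(packet):
    """Task 1: check the packet and collect metadata. Returns None if invalid."""
    checksum, body = packet[:4], packet[4:]
    if hashlib.sha256(body).digest()[:4] != checksum:
        return None
    slot, index, data = body.split(b":", 2)
    return {"slot": int(slot), "index": int(index), "data": data}


def write_to_ledger(ledger, shreds):
    """Task 2: single-threaded writer, so the ledger needs no locking."""
    for shred in shreds:
        if shred is not None:
            ledger[(shred["slot"], shred["index"])] = shred["data"]


def process_batch(packets, ledger, workers=4):
    """Run task 1 across a pool, then task 2 on the calling thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        verified = list(pool.map(receive_and_verify, packets))
    write_to_ledger(ledger, verified)
```

Each worker runs the whole of task 1 for a packet, so there is no stage-to-stage handoff inside the pool. That is the difference from the old pipelined layout, where each stage had its own thread.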

the following threads are all eliminated and the logic is run by the shred receiver thread
- two packet handler threads for turbine and repair
- shred verifier thread

- unreachable: assume capacity when no capacity is guaranteed
- double free: deinit items in a recycled list without clearing the list, then deinit again on the next usage of the list
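
The double free pattern can be simulated in Python, which has no manual memory management, so this is only an analogy for the Zig code. The `Buffer` class and both `handle_batch_*` functions are hypothetical names invented for the example. The "buggy" version frees every item in a reused scratch list but leaves the stale entries in place, so the next call frees them again.

```python
class Buffer:
    """Stand-in for a manually managed allocation."""

    def __init__(self, name):
        self.name = name
        self.freed = False

    def deinit(self):
        if self.freed:
            raise RuntimeError(f"double free of {self.name}")
        self.freed = True


def handle_batch_buggy(scratch, names):
    """Frees the items but leaves the stale entries in the recycled list."""
    for name in names:
        scratch.append(Buffer(name))
    for buf in scratch:
        buf.deinit()  # on a second call this also hits the previous batch's items


def handle_batch_fixed(scratch, names):
    """Frees the items, then clears the list so the next use starts empty."""
    for name in names:
        scratch.append(Buffer(name))
    for buf in scratch:
        buf.deinit()
    scratch.clear()
```

Calling `handle_batch_buggy` twice with the same `scratch` list raises on the second call, while `handle_batch_fixed` can be called repeatedly.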

codecov · 2025-10-02T02:01:46Z

Codecov Report

❌ Patch coverage is 87.09677% with 16 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/shred_network/shred_receiver.zig	85.98%	15 Missing ⚠️
src/shred_network/service.zig	92.85%	1 Missing ⚠️

Files with missing lines	Coverage Δ
src/gossip/service.zig	`88.02% <ø> (+0.19%)`	⬆️
src/net/packet.zig	`100.00% <ø> (ø)`
src/net/socket_utils.zig	`92.92% <100.00%> (+0.06%)`	⬆️
src/shred_network/repair_service.zig	`82.95% <ø> (ø)`
src/shred_network/shred_retransmitter.zig	`33.76% <ø> (ø)`
src/shred_network/shred_verifier.zig	`80.00% <ø> (+61.48%)`	⬆️
src/utils/bitflags.zig	`100.00% <ø> (ø)`
src/shred_network/service.zig	`89.81% <92.85%> (-1.99%)`	⬇️
src/shred_network/shred_receiver.zig	`82.85% <85.98%> (+35.58%)`	⬆️

... and 5 files with indirect coverage changes

InKryption

A few suggestions, otherwise I'm a fan of this step forward.

src/net/socket_utils.zig

src/shred_network/service.zig

src/shred_network/shred_receiver.zig

yewman

Nice improvement 👍

dnut added 9 commits August 26, 2025 14:55

perf(shred-network): consolidate redundant threads into shred receiver

bcde750

the following threads are all eliminated and the logic is run by the shred receiver thread
- two packet handler threads for turbine and repair
- shred verifier thread

shred processor batch limit

2fb7420

Merge branch 'master' into dnut/shred-network/consolidate-threads

c915465

refactor(shred-network): consolidate shred processor into shred receiver

dbda8fa

Merge branch 'master' into dnut/shred-network/consolidate-threads

e404e36

fix(shred-network): remove unused imports

f92a8d3

fix(shred-network): list abuse

73c0699

- unreachable: assume capacity when no capacity is guaranteed
- double free: deinit items in a recycled list without clearing the list, then deinit again on the next usage of the list

Merge branch 'master' into dnut/shred-network/consolidate-threads

e8cf569

fix: style

73e71e7

github-project-automation bot added this to Sig Oct 2, 2025

github-project-automation bot moved this to 🏗 In progress in Sig Oct 2, 2025

dnut and others added 4 commits October 3, 2025 15:56

refactor(shred_network): separate shred receiver initialization from iteration

b893311

move socket threads into run function

b098e33

add handleBatch test - repair flow

51d1bfb

add handleBatch - shred flow

1e14a76

dnut requested a review from InKryption October 7, 2025 21:33

dnut marked this pull request as ready for review October 7, 2025 21:33

dnut requested review from ultd and yewman as code owners October 7, 2025 21:33

dnut requested a review from Sobeston October 7, 2025 21:33

Merge remote-tracking branch 'origin/main' into dnut/shred-network/consolidate-threads

676a3c6

InKryption reviewed Oct 8, 2025

src/net/socket_utils.zig (outdated, resolved)

src/shred_network/service.zig (outdated, resolved)

src/shred_network/shred_receiver.zig (outdated, resolved)

src/shred_network/shred_receiver.zig (outdated, resolved)

kprotty added 3 commits October 9, 2025 15:07

fix ping/pong rename

a84f124

add Packet.Flags to all SocketThread spawns

1893404

formatting

57328d7

InKryption approved these changes Oct 9, 2025

dnut enabled auto-merge October 9, 2025 21:10

yewman approved these changes Oct 13, 2025

dnut added this pull request to the merge queue Oct 13, 2025

github-project-automation bot moved this from 🏗 In progress to 👀 In review in Sig Oct 13, 2025

Merged via the queue into main with commit 279ff14 Oct 13, 2025
18 checks passed

dnut deleted the dnut/shred-network/consolidate-threads branch October 13, 2025 09:29

github-project-automation bot moved this from 👀 In review to ✅ Done in Sig Oct 13, 2025


5 participants
